magnum.np: a PyTorch based GPU enhanced finite difference micromagnetic simulation framework for high level development and inverse design

magnum.np is a micromagnetic finite-difference library completely based on the tensor library PyTorch. The use of such a high level library leads to a highly maintainable and extensible code base which is the ideal candidate for the investigation of novel algorithms and modeling approaches. On the other hand magnum.np benefits from the device abstraction and optimizations of PyTorch enabling the efficient execution of micromagnetic simulations on a number of computational platforms including graphics processing units and potentially Tensor processing unit systems. We demonstrate a competitive performance to state-of-the-art micromagnetic codes such as mumax3 and show how our code enables the rapid implementation of new functionality. Furthermore, handling inverse problems becomes possible by using PyTorch’s autograd feature.


Design
In contrast to many available micromagnetic codes magnum.np follows a high-level approach for easy readability, maintainance and development. The Python programming language combined with PyTorch offers a powerful environment, which allows to write high-level code, but still get competitive performance due to proper vectorization.
PyTorch 13 has been chosen as backend since it allows transparently switching between CPU and GPU without modification of the code. Also the use of single or double precision arithmethic can be switched easily (e.g. use torch.set_default_dtype). Furthermore, it offers a very flexible tensor interface, based on the the Numpy Array API. Directly using torch tensors for calculation avoids the need for custom vector classes and allows using pytorch functions without the need for any wrapping code.
As a nice benefit of using PyTorch, one can directly use inverse operations via the PyTorch's autograd feature 14 . Even the utilization of deep neural networks in combination with classical micromagnetics would become feasable 15 .
One key philosophy of the magnum.np design is to utilize few well-known libraries in order to delegate work, but keep its own code clean and compact. On the other hand we try to keep the number of dependencies as small as possible, in order to improve maintainability. As an example pyvista is used for simple reading or writing VTK files, but also offers many additional capabilities (mesh formats, visualization, etc.). Figure 1 summarizes the most important building blocks and features. The state class contains the actual state of the simulation like time t, magnetization m or in case of an Oersted field the corresponding current density j . It also contains the information about mesh and materials. The finite difference method is based on an equidistant rectangular mesh consisting of n x × n y × n z cells, with a grid spacing (�x, �y, �z) and an origin (x 0 , y 0 , z 0 ) . Thus the index set (i, j, k) is sufficient to identify an individual cell center: Internally, physical fields are stored as multi-dimensional PyTorch tensors, where one value is stored for each cell (e.g. scalar fields are stored as (n x , n y , n z , 1) tensors). Using Numpy Array API features like slicing or fancy indexing allows simple modification of the corresponding data. Furthermore, it allows the use of the same expression for constant and non-constant materials, which contains one material parameter for each cell of the mesh. This avoids additional storage in case of constant materials, without the need for independent code branches. By using overloading of the __call__ operator, it is even possible to allow time dependent material parameters in a transparent way.
It is often very useful to select sub-regions within the mesh, e.g. for defining location dependent material parameters, or evaluate the magnetization only in a part of the geometry. We call these sub-regions "domains" and they are easily represented by boolean tensors, which can be created by low-level tensor operation or by using SpatialCoordinate -a list of tensors (x, y, z) which store the physical location of each cell. Using these coordinate tensors allows to specify domains by simple analytic expressions (e.g. x 2 + y 2 < r 2 for a circle with radius r). The same coordinate tensors can also be used to parametrize magnetic configurations like vortices or skyrmions (see e.g. Listing 1 with the corresponding magnetization visualized in Fig. 2).
(1) www.nature.com/scientificreports/ The actual state can be stored by means of loggers. The ScalarLogger is able to log arbitrary scalar functions depending on the current state (e.g. average magnetization, field at a certain point, GMR signal, ...). The FieldLogger stores arbitrary field data using VTK.
Due to the very flexible interface it is also intendend to add utility function for various application cases to the magnum.np library. In many cases pre-and post-processing is already done in some high-level python scripts, which makes it possible to directly reuse those codes in magnum.np at least on CPU. In many cases time-critical routines can be easily translated into PyTorch code, which then also runs on the GPU, due to the common Numpy Array API. Examples of such utility functions which are already included within magnum.np are Voronoi mesh generators, several imaging tools for post-processing -like Lorentz Transmission Electron Microscopy(LTEM) or Magnetic Force Microscopy(MFM) -or the calculation of a dispersion relation from time-domain micromagnetic simulations.

Landau-Lifshitz-Gilbert equation
Dynamic micromagnetism is described by the Landau-Lifshitz-Gilbert equation with the reduced magnetization m , the reduced gyromagnetic ratio γ = 2.21 × 10 5 m/As , the dimensionless damping constant α , and the effective field h eff . The effective field may contain several contributions like the magnetostatic strayfield, or the quantummechanical exchange interaction (see "Field terms" section for the detailed descriptions of possible field terms).
For the solution of the Eq. (2) in time-domain most finite difference codes use explicit Runge-Kutta (RK) methods of different order. Magnum.np by default uses the Runge-Kutta-Fehlberg Method (RKF45) 16 , which uses a 4th order approximation with a 5th order error control. Explicit RK methods, are very common, due to their simplicity and they are well suited for modern GPU computing. Additionally, third party solvers can be easily added, since many libraries already provide a proper python interface. For example wrappers for Scipy (CPU-only) and TorchDiffEq solvers are provided. Those solvers include more complicated solver methods like implicit BDF 17 , which are well suited for stiff problems.
Often one is only iterested in the magnetic groundstate, in which case the LLG can be integrated with a high damping constant (and optionally without the precession term). Alternatively, the micromagnetic energy 18,19 can be minimized directly, which is often much more efficient. However, special care has to be taken since, standard conjugate gradient method may fail to produce correct results 20 .

Field terms
The following section shows some implementation details of the effective field terms. Due to the flexible interface new field terms can easily be added even without modifying the core library. All field terms which are linear in the magnetization m inherit from the LinearFieldTerm class, in order to allow a common calculation of the energy using with the corresponding (continuous) field h lin , the saturation magnetization M s , and the vacuum permeability µ 0 .
In the following several field contributions will be described including a continuous formulation as well as the used discretization. For example the discretized version of the linear field energy can be written as with the cell volume V = x y z . x i describes a discretized quantity x at the cell with index i . Some indices i will be omitted for sake of better readability (e.g. for the material parameter M s ).
Anisotropy field. Spin orbit coupling gives rise to an anisotropy field which favors the alignment of the magnetization into certain axes. Depending on the crystal structure one or more of such easy axis may be observed. E.g. material with tetragonal or hexagonal structure show a uniaxial anisotropy which gives rise the the following interaction field where K u1 and K u2 are the first and second order uniaxial anisotropy constants, respectively, and e u is the corresponding easy axis. Since the anisotropy is a local interaction, its discretization is straight forward and will be ommited. The corresponding source code is shown in Listing 2.
Since the uniaxial anistropy field is a linear field term, only the field needs to be implemented, whereas the energy is inherited from the LinearFieldTerm. Material parameters are accessed from state.material which returns the material for each cell at the time state.t. The actual field expression is very close to the mathematical formulation, which makes the code easy to ready and adapt for similar use cases.
For a cubic crystal structure the corresponding cubic anisotropy field is given by where K u1 and K u2 are the corresponding first and second order cubic anisotropy constants. m 1 , m 2 and m 3 are the magnetization components in three orthogonal principal axes.
Exchange field. The quantum mechanical exchange interaction favours the parallel alignment of neigboring spins. Variation of the micromagnetic energy gives rise the the following exchange field combined with a proper boundary condition 21 for the magnetization m , which can be expressed as The boundary condition is important for the correct treatment of the outer system boundaries, but also for interface between different materials. In general the jump of B over an interface Ŵ needs to vanish ( www.nature.com/scientificreports/ In case of an outer boundary this leads to the well-known ∂m ∂n = 0 , if no further field contibutions (like e.g. DMI) are considered.
The discretized expression of the exchange field considering spacially varying material parameters 22 is finally given by where A is the exchange constant and k is the grid-spacing in direction k. The index i = (i, j, k) indicates the cell for which the field should be evaluated, whereas the index i ± e k means the index of the next neighbor in the direction ±e k . Note that the harmonic mean of the exchange constants occurs in front of each next-neighbor difference, which makes it vanish if a cell is located on the boundary. This is important to fulfill the correct boundary conditions ∂m ∂n = 0 . In case of a homogeneous exchange constant this term simplifies to the well known expression DMI field. Due to the spin-orbit coupling some materials show an additional antisymmetric exchange interaction called Dzyaloshinskii-Moriya interaction 23,24 . A general DMI field can be written as with the DMI strength D and the DMI vectors e dmi k , which describe which components of the gradient of m contribute to which component of the corresponding field. It is assumed that e dmi −k = −e dmi k . Different kinds of DMI can be simply implemented by specifying the corresponding DMI vectors. For example the continuous interface DMI field for interface normals in z direction and DMI strength D i is given by Thus, the corresponding DMI vectors for interface DMI result in e dmi = (e y , −e x , 0) . See Table 1 for a summary of the most common DMI types.
Finally, Eq. (11) is discretized using central finite differences. For constant D i this results in where D i,k is the effective DMI coupling strength between cell i and i + e k . Similar to the case of the exchange field, the harmonic mean is used for the avarage coupling strengths: Note, that if DMI interactions are in place ∂m ∂n = 0 does no longer hold. Instead, inhomogeneous Neumann boundary conditions occur (see e.g. Eqs. 11-15 in 2 ), which leads to a coupling of exchange and DMI interaction. The exchange field could no longer be calculated independent of the DMI interaction.
However, since the Neumann boundary conditions are only approximately fulfilled due to the finite difference approximation, magnum.np uses an alternative formulation of the discrete boundary conditions that simply ignores the non-existing values on the boundary, which is consistent with the effective coupling strengths in Eq. (14). Although, this approach seems less profound, it has been used in some well-known micromagnetic simulation packages, like fidimag 5 or mumax3(openBC) 2 , and shows good agreements for many standard problems 25 . www.nature.com/scientificreports/ Demagnetization field. The dipole-dipole interaction gives rise to a long-range interaction. The integral formulation of the corresponding Maxwell equations can be represented as convolution of the magnetization with a proper demagnetization kernel N Discretization on equidistant grids results in a discrete convolution which can be efficiently solved by a Fourier method. The discrete convolution theorem combined with zero-padding of the magnetization allows to replace the convolution in real space, with a point-wise multiplication in Fourier space. The discrete version of Eq. (15) reads like and is visualized in Fig. 3 The average interaction from one cell to another can be calculated analytically using Newell's formula 26 . More information about the implementation details can be found in 27 , where the demagnetization field has been implemented using numpy.
As shown in Fig. 4 the Newell formula is prone to fluctuations if the distance of source and target cell is too large 28 . Thus, it is favourable to use Newell's formula only for the p next neighbors of a cell. For the long-range interaction one uses a simple dipole field with the magnetic moment M = V M s m for a cell volume V.
The difference of Newell-and dipole-field is also visualized in Fig. 4. Choosing p = 20 as default gives accurate results for the near-field, but avoid fluctuations to the long-range interactions. One further positive effect of using the dipole field for long-range interaction is that the setup of the demagnetization gets much faster and there is no need for caching the kernel to disk.
In case of multiple thin layers, which are not equi-distantly spaced, it is possible to only use the convolution theorem in the two lateral dimensions 3 . The asymptotic runtime in this case amounts to O(n xy log n xy n 2 z ) , where n xy are the number of cells within the lateral dimensions and n z is the number of non-equidistant layers.
True periodic boundary conditions can be used to suppress the influence of the shape anisotropy due to the global demagnetization factor. This is crucial when simulating the microstructure of magnetic materials. The differential version of the corresponding Maxwell equations can be solved efficiently by means of the Fast Fourier Transfrom, which intrinsically fulfills the proper periodic boundary conditions 29 .
Oersted field. For many applications like the optimization of spinwave excitation antennas 30,31 or spin orbit torque enabled devices 32,33 the Oersted field created by a given current density has an important influence. For continuous current density j it can be calculated by means of the Biot-Savart law Most common finite difference micromagnetic codes offer the possibility to use arbitrary external fields, but lack the ability to calculate the Oersted field directly. Fortunately, the Oersted field has a similar structure to the demagnetization field and the occuring integral equations can be solved analytically 34 . This makes it possible to consider current densities which vary in space and time, since the corresponding field can be updated at each time-step. www.nature.com/scientificreports/ As with the demagnetization field the far-field is approximated by the field of a singular current density, which avoids numerical fluctuations.
Spin-torque fields. Modern spintronic devices are based on different kinds of spin-torque fields 35,36 , which describe the interaction of the magnetization with the electron spin. An overview about models and numerical methods used to simulate spintronic devices can be found in 21 .
In general arbitrary spin torque contributions can be described by the following field with the current density j e , the reduced Planck constant , the elementary charge e, and the polarization of the electrons p . η damp and η field are material parameters which describe the amplitude of damping-and field-like torque 37 .
In case of Spin-Orbit-Torqe (SOT) η field and η damp are constant material parameters, whereas for the Spin-Transfer-Torque inside of magnetic multilayer structures those parameters additionally depend on ϑ-the angle between m and p . Expressions for the angular dependence are e.g. introduced in the original work of Slonczewski 38 or more generally in 39 .
Spin-Transfer-Torque can also occur in bulk material inside regions with high magnetization gradients like domain walls, or vortex-like structures. The following field has been proposed by Zhang and Li 40 for this case: with the reduced gyromagnetic ratio γ , the degree of nonadiabacity ξ . b is the polarization rate of the conducting electrons and can be written as with the Bohr magneton µ B , and the dimensionless polarization rate β.
The muMAG Standard Problem #5 is included in the magnum.np source code for demonstration of the Zhang-Li spin-torque.
Interlayer-exchange field. The Ruderman-Kittel-Kasuya-Yosida (RKKY) interaction 41 gives rise to an exchange coupling of the magnetic layers in multilayer structures which are separated by a non-magnetic layer. The corresponding continuous interaction energy can be written as where Ŵ is the interface between two layers with magnetizations m 1 and m 2 , respectively. J rkky is the coupling constant which oscillates with respect to the spacer layer thickness.
When discretizing the RKKY field using finite difference in many cases the spacer layer is not discretized. Instead the interaction constant J rkky is scaled by the spacer layer thickness. Additionally, one has to make sure  www.nature.com/scientificreports/ that the two layers are not coupled by the classical exchange interaction. In magnum.np the corresponding exchange field can be defined on subdomains, so there is no coupling via the interface. The magnetizations m 1 , m 2 should be evaluated directly at the interface. Since the magnetization is only available at the cell centers, most finite difference codes use a lowest order approximation which directly uses those center values. magnum.np also allows to use higher order approximations, which show significantly better convergence if partial domain walls are formed at the interface 42 .
For m 1 the following expression can be found: where m i denotes the magnetization of the cell adjacent to the interface insided of layer 1, where the field should be evaluated. m i−1 and m i−2 are its first and second next neighbor, respectively. A similar expression is given for m 2 , but indices i are replaced with the corresponding indices j of cells inside of layer 2. Finally, the discretization of the RKKY field corresponding to the energy Eq. (22) yields with the cell thickness z and the indices i and j of two adjacent cells in layer i and j. Note that the second term stems from a modified boundary condition for the classical exchange field, if higher order approximations are used.
Thermal field. Thermal fluctuation can be considered in micromagnetic simulations by adding a stochastic thermal field h th , which is characterized by with the Boltzmann constant k B , the temperature T, the dimensionless damping parameter α , the cell volume V, and the timestep t . . denotes the ensemble average. The two delta functions indicate that the thermal noise is spatially and temporally uncorrelated. The actual thermal field can then be calculated by where η i is a random vector drawn from a standard normal distribution for each time-step. When numerically integrating stochastic differential equations, a drift term can occur if not using the correct statistics within the numerical methods. Although some higher-order Runge-Kutta schemes exist, they become increasingly complex. Fortunately, it has been proven that in case of the LLG the drift term only changes the length of the magnetization, which is fixed anyway. Thus, it is possible to straight forwardly use available adaptive higher order schemes for the solution of the stochastic LLG 43 . Timings. Benchmarks of the field terms are presented in Fig. 5. The results show that for systems larger than about N = 10 6 elements, the demagnetization field is the dominating field term and it is less than a factor 2 slower than the mumax3 version. However, these timings have been performed without any low-level optimization. Instead magnum.np utilizes high-level optimization, that does not influence the simplicity of the code. For example just-in-time compilers (like PyTorch-compile, numba, nvidia-warp, etc.) are used to improve the performance of the code. For all local field contributions this works increadibly well and the resulting timings are even outperforming mumax3. Optimized timing using torch.compile of the recently published version 2.0 of PyTorch are included in Fig. 5. Unfortunately, torch.compile does not yet support complex datatypes, which prevents it from being used to calculate the demagnetization field.
In case of the demagnetization field an optimized padding for the 3D FFT which is not yet provided by PyTorch, could give some further speedup.

Examples
The  www.nature.com/scientificreports/ als of the system 45 . This simulation technique is also useful for the numerical modeling of structued Pt-layers on top of the thin-film that create a location-dependent DMI interaction as realized recently in an experimental work 46 . Listing 3 shows the material definition for the spintronic demo, where the anisotropy constant is altered in the irradiated region. A rectangular mesh with n cells and a grid spacing dx is created and integer domain-ids are read from an unstructured mesh file by means of the mesh_reader. Boolean domain arrays can then be derived and in turn be used to set location dependent material parameters, which will influences the local skyrmion densities.
A random initial magnetization is set and the default RKF45 solver is used for time-integration. Several logging capabilities allow to flexibly log scalar-and field-data to files. Custom python functions that return derived quantities, such as the Induction Map (IM) or the Lorentz Transmission Electron Microscopy (LTEM) image of the magnetization state, can simply be added as log entries. Listing 4 shows the corresponding code and the results are visualized in Fig. 6. One can see that the density of skyrmions in the irradiated region is increased significantly compared to the outside region. The lower anisotropy allows the nucleation of not only skyrmions, but also trivial type-II bubbles, and antiskyrmions 47 . Inverse design. Finding the optimal shape of magnetic components for certain applications is an essential, but quite challenging task. An automated topology optimization requires the efficient calculation of the so-called forward problem, as well as the corresponding gradients (compare e.g. 48,49 ). The following example should demonstrate how magnum.np can be used to solve inverse problems, by utilizing PyTorch's autograd mechanism.
The field created by a magnetization at a certain location x 0 should be maximized. The objective function J which should be minimized could thus be defined as J[m] = h y (x 0 ) ). The forward problem is simply an Figure 5. Benchmarking (a) demagnetization field and (b) exchange field for different system sizes N on an Intel(R) Xeon 6326 CPU @ 2.90 GHz using one NVIDIA A100 80GB GPU (CUDA Driver 11.8). An average of 10000 evaluations has been measured for each field term. Before measurent begins, 1000 warm-up loops are used to ensure that the GPU has reached its maximum performance state. Single precision arithmetics are used for comparison with mumax3. mesh = Mesh(n = (1500, 300, 1), dx = (10e-9, 10e-9, 20e-9)) # mesh domains = read_mesh(mesh, "mesh.msh", scale = 1e-9) # www.nature.com/scientificreports/ evaluation of the demagnetization field. The optimization requires the calculation of the gradient g = ∂J ∂m . The magnetization should always point in y direction, and its magnitude m y saturates at M s .
The optimal magnetization which leads to the maximum field at the evaluation point can be found by using an gradient-based optimization method (e.g. Conjugate Gradient). Since this simple example is linear, the optimal solution is found after a single iteration. Depending on the sign of the gradient the optimal magnetization within each cell is 1, if the calculated gradient is positive and 0 otherwise. Listing 5 summarizes how the gradient calculation is performed. The optimized magnetization is visualized in Fig. 7 and shows perfect agreement with the analytical result.

Conclusion
An overview of the basic design ideas of magnum.np has been given. Equations and references for the most important field contributions as well as solving methods are included for clarification. Some typical applications are provided in order to demonstrate the ease of use and the power of the provided python-base interface. Furthermore the use of PyTorch extends magnum.np's capabilities to inverse probems and allows seamlessly running applications on CPU and GPU without any modification of the code. The openness of the project should encourage other developers to contribute code and use magnum.np as a framework for the development and testing of new algorithms, while still getting reasonable performance and generality.  Listing 4:. Setup of time-integration for 20 ns and logging. Scalar data, like timet and avarage magnetization 〈m〉, will be written to a column based text field.Field data, like the magnetization m as well as a corresponding LTEM image,will be written to .vti files utilizing pyvista.